No strong support for a Dunning–Kruger effect in creativity: analyses of self-assessment in absolute and relative terms

Competencies related to the evaluation of own cognitive processes, called metacognitive monitoring, are crucial as they help decide whether to persist in or desist from cognitive efforts. One of the most well-known phenomena in this context—the Dunning–Kruger effect—is that less skilled people tend to overestimate their performance. This effect has been reported for various kinds of performance including creativity. More recently, however, it has been suggested that this phenomenon could be a statistical artifact caused by the better-than-average effect and by regression toward the mean. Therefore, we examined the Dunning–Kruger effect in the context of creative thinking performance (i.e., divergent thinking ability) across two studies (Study 1: N = 425; Study 2: N = 317) and applied the classical quartile-based analysis as well as newly recommended, advanced statistical approaches: the Glejser test of heteroscedasticity and nonlinear quadratic regression. We found that the results indeed depended on the employed statistical method: While classical analyses supported the Dunning–Kruger effect across all conditions, it was not consistently supported by the more advanced statistical methods. These findings are in line with recent work challenging certain assumptions of the Dunning–Kruger effect and we discuss factors that undermine accurate self-assessments, especially in the context of creative performance.


Explorations of the Dunning-Kruger effect in creative performance
The accuracy of self-assessments of creativity has been a topic of interest for some time 37,38. However, to the best of our knowledge, only two studies have specifically examined the Dunning-Kruger effect in creativity. The first one used the classical quartile method 39 and found that, across the whole sample, people overestimated their creative performance in three different creative thinking tasks (the Similarities Test, the Remote Associates Test, and the Product Improvement Task). However, when looking at the quartiles separately, the classical Dunning-Kruger pattern was observed: Those in the highest quartile tended to underestimate their creativity, while those in lower quartiles had a tendency towards self-inflated appraisals. Interestingly, the overestimation of the lower quartile was smaller compared to classical studies 20.
The second study examined the Dunning-Kruger effect in divergent thinking (assessed with the Alternate Uses Test) across four educational stages (preschool, elementary school, high school, and undergraduate) 40. Taking into account the problems with the classical data analysis, the authors computed a non-hierarchical cluster analysis and identified three separate clusters: an overestimating group of unskilled and unaware participants (27.1%), an underestimating group of skilled and unaware participants (44.3%), and, most surprisingly, a group of unskilled but aware people (28.6%). The unskilled and unaware participants were mostly in the group of preschoolers, while the most skilled but unaware were among the undergraduate students, suggesting that effects were moderated by age group. In sum, this methodological approach overcame limitations of traditional quartile-based analysis and offered new insights into developmental increases of creative metacognition with age. Still, it remains unclear whether other recent methods support a Dunning-Kruger effect with respect to creative ideation performance in adults.

Present study
The main aim of this work was to test whether there is a Dunning-Kruger effect in creativity, as assessed by divergent thinking performance, and whether observation of this effect depends on the employed statistical methods, including the classical quartile analysis as well as the Glejser test of heteroscedasticity and nonlinear (quadratic) regression. To this end, we conducted two independent studies. In the first study, we expected to confirm the Dunning-Kruger effect when using the classical analysis 39 but did not make predictions regarding the two recently recommended approaches. In the second, preregistered study, we tested the robustness of our findings by examining whether the results of the first study replicate. For a more thorough analysis of the conditions of potential Dunning-Kruger effects, we further explored whether it matters how and when people make their self-assessments. Absolute self-assessments (I'm good at XY) and relative self-assessments (I'm better than others in XY) are both informative about how participants judge themselves and, more importantly, are thought to be distinct in nature 3,41-43, so we included both. Kruger and Dunning 20 proposed that high performers assume that others' skills are similar to their own, so they may underestimate themselves when using a relative scale, but not when making absolute estimates. On the other hand, low performers are likely to overestimate themselves in both cases 44. Furthermore, we decided to ask participants to assess their performance before and after the tasks. It can be assumed that post-task assessments would be more accurate as the task itself provides valuable feedback 14,24,45,46. Nevertheless, available meta-analytic findings showed little to no difference in accuracy based on whether self-assessments were made before or after performance 6,47-49. By incorporating relative and absolute assessments, pre- and post-task assessments, and applying both classical and recent statistical approaches to test the Dunning-Kruger effect, we hoped to realize a thorough test of Dunning-Kruger effects in creativity, and, more generally, gain deeper insights into the accuracy of creative metacognition.

Method
We conducted two online studies assessing divergent thinking (DT) ability with the established alternate uses task 50 and asking participants to estimate their performance before and after the task. Given the similar methods, we describe both studies together in the following. Note, however, that both studies were independent, and the second study was preregistered (https://aspredicted.org/ey3bv.pdf) and realized only after completion of the first study.

Sample
Both studies excluded data from participants who either (a) missed any of the attention checks (i.e., two questions intermixed with the other questionnaire items), (b) completed the study overly fast (< 1400 s), or (c) did not generate at least one response to each DT task. Study 1 had a final sample of 425 participants (68.5% female, 31.1% male, 0.5% other), 61.9% of whom were university students, with a mean age of 29 years (M = 28.68, SD = 11.29, range: 18-69 years). In study 2, the final sample consisted of 317 participants (66.2% female, 33.1% male, 0.6% other), 59% of whom were university students, with a mean age of 30 years (M = 29.81, SD = 13.13, range: 17-73 years). According to Gignac and Zajenkowski (2020), 200 is the minimum sample size to be able to interpret quadratic regression analyses and Glejser correlations. For the classical analysis, our sample size ensures at least 80% power to detect within-quartile effects of |d| ≥ 0.4 (based on the smallest quartile with n = 73).

Procedure
Participants completed all measures online via LimeSurvey. They first read and confirmed the informed consent form. In study 1 (conducted November-December 2021) and study 2 (conducted November-December 2022), divergent thinking tasks were completed. Before and after the tasks, participants provided self-assessments of their creativity. As these studies were part of a larger research project, participants completed additional tasks that were not in the focus of this work and thus are not reported here. The study was conducted according to the guidelines of the Declaration of Helsinki and the procedure had been approved by the ethics committee of the University of Graz (GZ.39/146/63 ex 2020/21).

Materials
The data, code, and supplementary material with additional analyses are available on the OSF (https://osf.io/3kud6/).

Creativity
Creative thinking performance was measured with divergent thinking tasks, specifically alternate uses tasks (AUT), which ask participants to generate creative uses for everyday objects 50. In study 1, participants completed four AUT tasks (DT items were: brick, car tire, pen, can), two of which (randomly either the first or last two) were performed under "be creative" instructions and the other two under "be fluent" instructions (i.e., asking participants to focus on the creative quality or the quantity of responses, respectively 51). For the assessment of DT creativity, we focused on the two items with "be creative" instructions. Five independent raters rated all responses on a scale ranging from zero (not creative at all) to four (very creative). Inter-rater reliability was high for all items, ICC between 0.82 and 0.83; internal consistency was decent for just two items, with Cronbach's alpha = 0.53. In study 2, participants performed three AUT tasks (DT items were: brick, car tire, pen) under "be creative" instructions and six independent raters rated all responses on the same scale. Inter-rater reliability was high for all tasks, ICC between 0.80 and 0.83; internal consistency across the three items was satisfactory, with Cronbach's alpha = 0.69. In both studies, we computed the max-3 score of DT creativity (i.e., the average of the three most creative responses according to average ratings) to address the potential confound between the creativity and fluency of responses 17,52. As fluency (i.e., the number of generated uses) is another measure often included in creativity research but less central to the research questions at hand, we additionally report examinations of Dunning-Kruger effects for self-assessed versus actual fluency (under the "be fluent" condition; study 1) in the appendix (https://osf.io/3kud6/).
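To make the max-3 scoring concrete, the following is a minimal Python sketch, not the authors' code. The function name `max3_score`, the nested-list data layout, and the handling of participants with fewer than three responses (averaging whatever is available) are assumptions made for illustration.

```python
from statistics import mean

def max3_score(ratings_per_response, top_k=3):
    """Average the top_k highest mean ratings among one participant's responses.

    ratings_per_response holds one inner list per response, containing each
    rater's score on the 0-4 scale (hypothetical layout, not from the paper).
    """
    if not ratings_per_response:
        raise ValueError("participant produced no responses")
    # Step 1: collapse raters into one mean creativity rating per response.
    response_means = [mean(r) for r in ratings_per_response]
    # Step 2: keep the top_k most creative responses
    # (all of them if fewer exist; this fallback is an assumption).
    best = sorted(response_means, reverse=True)[:top_k]
    # Step 3: average them, so extra low-quality responses cannot change the score.
    return mean(best)

# Two hypothetical participants rated by two raters each.
fluent = [[1, 1], [2, 2], [3, 3], [4, 4], [0, 0], [0, 0]]  # six responses
focused = [[2, 2], [3, 3], [4, 4]]                          # three responses
print(max3_score(fluent), max3_score(focused))
```

Both hypothetical participants receive the same score although one produced twice as many responses, which is the sense in which this kind of score separates creative quality from fluency.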

Self-assessed creativity
Before the DT tasks, participants indicated once how well they expected to perform in the task in general (absolute pre-task self-assessment) by reporting their agreement with the statement "I can come up with creative ideas" on a slider scale from 0 (do not agree at all) to 100 (totally agree). Next, they indicated how well they expected to do compared to others (relative pre-task self-assessment) on a slider scale from 0% (everybody else will have more creative ideas than me) to 100% (I will have more creative ideas than everybody else). After completing all DT tasks, participants responded to analogous questions in the past tense (absolute and relative post-task self-assessment).

Statistical analyses
We tested Dunning-Kruger effects under different conditions, including for (1) pre- and post-task self-assessments of creative performance, (2) absolute and relative self-assessments, and (3) using three different statistical approaches. The first test of the Dunning-Kruger effect was based on the classical method used by the original authors 20. In line with them, we transformed our performance measures into percentile ranks to be able to compare them directly with the self-assessments (range: 0-100). In case of ties, we assigned each tied element the average rank (i.e., all participants with the same raw score also have the same rank). We then split our samples into quartiles based on their performance and compared self-assessments and performance (within-subjects factor measure) for the quartiles (between-subjects factor quartile) in ANOVAs 20. We interpreted results as supporting a Dunning-Kruger effect if the ANOVA resulted in a significant interaction and the pairwise comparisons indicated that the lowest quartile showed the largest positive difference between self-assessment and performance (overestimation).
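The descriptive part of this classical procedure (average ranks for ties, percentile transformation, quartile split, and the self-assessment minus performance difference) might be sketched as follows. This is an illustration only: the function names are hypothetical, the rank-to-percentile convention (rank / n × 100) and the quartile cut-offs at 25, 50, and 75 are assumptions, and the ANOVA and pairwise tests are not reproduced.

```python
from math import ceil
from statistics import mean

def average_ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def percentile_ranks(values):
    """Map average ranks onto 0-100 (the rank / n * 100 convention is an assumption)."""
    n = len(values)
    return [100 * r / n for r in average_ranks(values)]

def quartile_bias(performance, self_assessment):
    """Mean (self-assessment - performance percentile) per performance quartile."""
    groups = {1: [], 2: [], 3: [], 4: []}
    for p, sa in zip(percentile_ranks(performance), self_assessment):
        quartile = max(1, min(4, ceil(p / 25)))  # assumed cut-offs at 25, 50, 75
        groups[quartile].append(sa - p)
    return {q: mean(d) for q, d in groups.items() if d}

# Hypothetical data: eight participants who all rate themselves at 50.
bias = quartile_bias([1, 2, 2, 3, 4, 5, 6, 7], [50] * 8)
print(bias)
```

Positive values indicate overestimation. In this toy example, everyone gives the same self-assessment, yet the lowest quartile shows the largest positive difference and the highest quartile the largest negative one.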
As a second way of testing the Dunning-Kruger effect, we computed the Glejser correlation 34, a measure of the heteroscedasticity of residuals 33. To calculate Glejser correlations, we (1) conducted linear regressions predicting self-assessed creativity from objectively measured creativity, (2) transformed the resulting residuals into absolute values, and (3) correlated these absolute residuals with objectively measured creativity. Here, a significantly negative correlation would be indicative of a Dunning-Kruger effect.
As a third test of the Dunning-Kruger effect, we conducted quadratic regressions, which Gignac 35 argued to be less ambiguous than the Glejser correlation. In hierarchical regressions, we first entered a linear performance term into a model explaining self-assessment and then added a quadratic performance term. Following the recommendations by Gignac and Zajenkowski 33, we considered these analyses as supporting a Dunning-Kruger effect if the ΔR² between step one and two as well as the sr² of the quadratic term were significant. Of note, since the quadratic term is the only predictor entered in the second step of these analyses, its sr² is identical to the ΔR² between step one and two. For this reason, we only report ΔR².
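The identity between sr² and ΔR² can be used to compute the increment directly: the squared semipartial correlation of the quadratic term equals the squared correlation between self-assessment and the part of the squared performance term that is not linearly predictable from performance. The sketch below relies on that identity; the function names are hypothetical, and no significance test or bootstrap is included.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residualize(target, predictor):
    """Residuals of target after a simple linear regression on predictor."""
    mp, mt = mean(predictor), mean(target)
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    return [t - (mt + slope * (p - mp)) for p, t in zip(predictor, target)]

def quadratic_increment(performance, self_assessment):
    """Return (R² of the linear step, ΔR² gained by adding the quadratic term)."""
    r2_linear = pearson_r(performance, self_assessment) ** 2
    squared = [x ** 2 for x in performance]
    # Part of the quadratic term that is independent of the linear term.
    squared_resid = residualize(squared, performance)
    # Squared semipartial correlation = ΔR² for a single added predictor.
    delta_r2 = pearson_r(squared_resid, self_assessment) ** 2
    return r2_linear, delta_r2

x = [-2, -1, 0, 1, 2]
curved = quadratic_increment(x, [v ** 2 for v in x])        # purely quadratic relation
straight = quadratic_increment(x, [2 * v + 1 for v in x])   # purely linear relation
print(curved, straight)
```

In the two hypothetical cases, a purely quadratic relation is captured entirely by the second step, whereas a purely linear relation leaves nothing for the quadratic term to explain.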
As neither the Glejser test nor quadratic regression requires a direct numerical comparison of self-assessment and performance, we conducted these analyses based on untransformed data (rather than percentiles) to preserve a maximum amount of information and a higher scale of measurement 36. As some readers might be interested in a direct comparison of the results of our classical and alternative analyses, we provide percentile-based results on the latter in the appendix. In short, these additional analyses showed virtually identical results in the Glejser tests and yielded no support for Dunning-Kruger effects in quadratic regressions. To counter potential violations of distributional assumptions, we based our interpretations on 95% bootstrapped confidence intervals based on 2000 samples.
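A bootstrapped confidence interval for a correlation can be illustrated as below. Note that this sketch uses the simple percentile bootstrap, whereas the Table 4 caption reports bias-corrected and accelerated intervals, so it is a simplified stand-in rather than the procedure used in the paper. The function names, the seed, and the simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bootstrap_ci(x, y, stat=pearson_r, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI (simpler than the BCa variant)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample participants (pairs) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated data for illustration only (not the study data).
data_rng = random.Random(0)
perf = [data_rng.gauss(0, 1) for _ in range(200)]
sa = [0.5 * p + data_rng.gauss(0, 1) for p in perf]
r_hat = pearson_r(perf, sa)
ci_low, ci_high = bootstrap_ci(perf, sa)
print(round(r_hat, 2), round(ci_low, 2), round(ci_high, 2))
```

Passing a different `stat` function with the same two-argument signature (for example, one returning a Glejser correlation) would produce an interval for that statistic instead.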

Descriptive statistics and intercorrelations
Table 1 contains the descriptive statistics and intercorrelations of all main variables for both studies. In both samples, the different creativity self-assessment measures were highly correlated with each other. Correlations between self-assessments and measured creativity were small to moderate, being descriptively somewhat higher in post-task assessments compared to pre-task assessments. We further report effect sizes (d) for statistical comparisons between mean self-assessed and measured creativity percentile (i.e., absolute accuracy). Interestingly, average differences between self-assessments and performance were only minor and, in most cases (especially for relative self-assessments, i.e., those made in comparison to others), negative, meaning that people tended to underestimate themselves slightly.

Classical analyses
Classical analyses showed support for Dunning-Kruger effects in all four conditions and across both studies. The relevant interaction between measure and quartile was significant in all ANOVAs (see Table 2). Pairwise comparisons also showed a pattern indicative of a Dunning-Kruger effect (see Table 3 and Fig. 1): People in the lowest quartile overestimated themselves the most. In five out of eight analyses, the second quartile also showed significant, albeit considerably lower, overestimation. Those in the highest quartile, and to a lesser degree also those in the second-to-highest quartile, were prone to underestimate themselves.

Glejser test of heteroscedasticity
On a merely descriptive level, we observed small negative correlations between absolute residuals and DT performance across all conditions (see Fig. 2), but they reached statistical significance only in specific cases (see Table 4).
In study 1, we found support for Dunning-Kruger effects in relative pre- and post-task self-assessments but not in absolute pre- or post-task self-assessments. While the Glejser correlation for relative post-task self-assessments was also significant in study 2, the one for relative pre-task self-assessments was not. Moreover, in study 2 absolute pre-task self-assessments also showed a significant negative Glejser correlation, while the respective correlation for absolute post-task self-assessments was still not significant. Thus, only one of four conditions showed consistent Dunning-Kruger effects in both samples (relative post-task self-assessment).

Discussion
Creative challenges are often ill-defined, with unclear evaluation criteria and infinite possible solutions. Therefore, self-assessment may be vital in regulating the creative process. At the same time, these specifics of creative tasks could make self-assessments in this area particularly hard, potentially leading to lower accuracy and higher biases 11-13. Past research indicated that creative people are doubly skilled in not only generating more creative ideas but also being able to judge the creativity of ideas more accurately 10,17-19. Still, the question remains whether creative people are also more accurate in judging their creative performance overall, that is, if they have higher creative metacognitive monitoring skills at the performance level, not just at the response level (idea evaluation) 14. Or put differently, do less creative people judge their creative performance less accurately, and hence, does the Dunning-Kruger effect extend to creativity? As available findings are not fully conclusive 39,40 and may partly depend on the employed test approach 35, this work aimed for a comprehensive test of the Dunning-Kruger effect in creative ideation performance with different methods. Contrary to the assumption that self-assessments in the context of creativity might be particularly challenging, we found that metacognitive monitoring accuracy at the performance level aligned well with past results from other domains 5,6, with correlations between self-assessment and creative performance ranging between 0.1 and 0.3 (see Table 1). Of note, these correlations are still low enough to question people's self-insight when it comes to creative performance. At least descriptively, creative self-assessments made after the task were slightly more accurate than those made before it. Thus, gaining experience with the task might help people calibrate their self-assessments to their performance, which has also been shown previously, although not consistently (24,45,46; but see 6,47-49).
When analyzing differences between self-assessments and performance, we found small but statistically significant underestimation effects for the majority of conditions. This result is surprising, considering previous findings in creativity research 39 and the general tendency of people to judge themselves as above average 29,30. But it is worth noting that a small number of studies also reported underestimation of performance in other areas like intelligence 24,36,53. In fact, creative people tend to underestimate the originality of their ideas 16, which may in consequence also lead them to underestimate their overall performance 54. Notably, self-assessments were consistently below average when people compared themselves to others, whereas self-assessments were sometimes above average when people assessed their creative performance in general (relative versus absolute self-assessment). This pattern suggests that self-assessments are specific to the type of question asked, allowing people to view their performance as creative, yet potentially less creative compared to others 40. In sum, we observed a below-average effect when people compare their creative performance to others, which may be due to the inherent uncertainty of creative processes (compared to other domains where people know when they solved a task correctly), which eventually decreases people's confidence.
Our main research question was whether uncreative people are particularly inaccurate in their performance assessment, as assumed by the Dunning-Kruger effect. When we considered the classical quartile analysis employed by the original authors 20, we found consistent support for the Dunning-Kruger effect across two samples and four self-assessment measures 39: The lowest quartile overestimated themselves the most. Additionally, the classical analyses revealed significant underestimation in the highest performance quartile across all conditions and not just for relative self-assessments. This latter finding stands in contrast with the assumption that the negative bias of high performers is mainly due to the false consensus effect (i.e., overestimation of others instead of underestimation of oneself 20). Notably, underestimation effects of high performers were descriptively even higher for relative self-assessments (ds between −1.92 and −2.25) compared to absolute self-assessments (ds between −1.55 and −1.79). This could mean that high performers both underestimate themselves and overestimate others. Still, overestimation for low performers and underestimation for high performers could also be simply due to regression-to-the-mean effects. Therefore, we used additional statistical tests that are potentially less affected by regression-to-the-mean effects.
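How regression to the mean alone can produce the quartile pattern may be illustrated with a small simulation. This is a hypothetical sketch, not an analysis from the paper: it assumes that performance and self-assessment are both noisy readings of the same latent skill, with the same noise level for everyone (so no skill-dependent inaccuracy is built in), and it picks the noise level so that the two measures correlate at roughly 0.2.

```python
import random
from math import ceil
from statistics import mean

def to_percentiles(values):
    """Percentile ranks for continuous data (ties are not expected here)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    pct = [0.0] * n
    for rank, i in enumerate(order, start=1):
        pct[i] = 100 * rank / n
    return pct

def simulate_quartile_bias(n=2000, noise_sd=2.0, seed=42):
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    # Same noise level for everyone: accuracy does NOT depend on skill.
    performance = [s + rng.gauss(0, noise_sd) for s in skill]
    self_assessment = [s + rng.gauss(0, noise_sd) for s in skill]
    perf_pct = to_percentiles(performance)
    sa_pct = to_percentiles(self_assessment)
    diffs = {1: [], 2: [], 3: [], 4: []}
    for p, s in zip(perf_pct, sa_pct):
        diffs[max(1, min(4, ceil(p / 25)))].append(s - p)
    # Mean over- (positive) or underestimation (negative) per performance quartile.
    return {q: mean(d) for q, d in diffs.items()}

sim_bias = simulate_quartile_bias()
print({q: round(b, 1) for q, b in sim_bias.items()})
```

Under these assumptions, the lowest performance quartile appears to overestimate and the highest to underestimate, even though every simulated person is equally (in)accurate by construction.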
In line with research from other domains 33,35,36, the consistent Dunning-Kruger effects observed in the classical analyses stood in stark contrast to the results obtained when employing the statistical alternatives suggested by Gignac and Zajenkowski 33. When we applied the Glejser test of heteroscedasticity 34, only relative self-assessments made after the task showed a consistently negative Glejser correlation across both samples. It can be conceded that all eight Glejser correlations were negative at a descriptive level. This may point to a small but consistent effect of less creative people being minimally less accurate, which could be established in more powerful analyses (n > 780 is needed to establish r = 0.1 with a power of 0.80), albeit with little practical significance. The support for the Dunning-Kruger effect was even weaker in quadratic regression analyses: Only one result was in favor of the Dunning-Kruger effect (relative self-assessments made before the task in study 1), but this was not replicated in study 2, suggesting that self-assessments and performance are not consistently more strongly correlated in more creative people. Of note, the results of our secondary analyses based on DT fluency of AUT responses (see appendix) were mostly in line with findings on DT creativity. In sum, our work adds to the growing body of literature finding only limited support for the Dunning-Kruger effect beyond regression-to-the-mean effects 23,26.
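The sample size quoted for detecting r = 0.1 can be approximated with the Fisher z transformation. The sketch below assumes a two-tailed test at α = 0.05; the paper does not state which tool or formula was used, so the function name and the approximation itself are assumptions.

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, power=0.80, alpha=0.05):
    """Approximate n needed to detect a correlation r (two-tailed, Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # Fisher z has a standard error of 1 / sqrt(n - 3), hence the "+ 3".
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)

n_small = n_for_correlation(0.1)
n_medium = n_for_correlation(0.3)
print(n_small, n_medium)
```

With this approximation, the required n for r = 0.1 lands slightly above 780, in line with the figure given in the text, and it drops sharply for larger correlations.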
This study has several strengths, including the use of two large samples, relative and absolute assessments, pre- and post-task assessments, and classical and recent statistical approaches, which enabled a comprehensive test of the Dunning-Kruger effect in the context of creative performance. However, it is essential to consider some limitations when interpreting the results. First, the samples involved a large proportion of university students, so caution should be taken when generalizing the results to other groups; creative professionals, for instance, may be able to judge their creativity more accurately. Second, the data were collected online, which implies lower experimental control compared to lab experiments, but also guaranteed the highest anonymity, which could support honest self-assessments. Third, we assessed creative performance only in the context of divergent thinking ability as assessed with the AUT. While this is arguably the most dominant approach for assessing creative ability in creativity research 55, the Dunning-Kruger effect could further be explored with other, more complex creativity tasks, and potentially domain-specific creativity measures. Future research thus may aim to replicate our findings in a more diverse sample and controlled context but also look beyond performance measures to predict creative metacognitive monitoring accuracy and consider other relevant variables like personality or previous experiences with tasks. We further need to acknowledge that the reliability of DT assessments was modest and that this was likely due to the low number of DT items. While using 2-3 items for DT assessment is common practice 56, it may not be enough to ensure highly reliable DT creativity measures, an issue that was not apparent to us as many previous studies (76%) failed to report the reliability of their DT assessments 56. It is well known that (low) reliability limits the potential strength of correlations (in this case between DT and self-assessments), and past work showed that reliability tends to be even lower at the extremes of the distribution 57,58. If associations between self-assessments and performance are lower at the extremes (i.e., in line with a DK effect), this might partly simply be a function of lower reliability 57. It is unclear whether this effect is aggravated by an overall low reliability. However, future studies focused on DT, the DK effect, and particularly their interaction would benefit from paying closer attention to reliability.
In conclusion, this work found no strong support for a Dunning-Kruger effect in creativity. The systematic overestimation/underestimation of own performance in low versus high performers may be largely due to regression-to-the-mean effects. Interestingly, the creative domain may stand out in not being subject to common above-average biases but, if anything, rather exhibit a below-average effect. More generally, people appear to have only rather limited insight into their creative performance level: some overestimate, others underestimate themselves, and yet others seem to be good judges of their creative performance. This raises the question of what other factors might predict an accurate assessment of one's own creativity if it is not creative performance itself. While being more creative supports discernment in judging the creativity of one's own and others' ideas, it may not come with higher accuracy in judging one's own creative task performance. So, creative people may only be doubly blessed, implying at least only a dual rather than a triple burden for less creative people. https://doi.org/10.1038/s41598-024-61042-1

Figure 1. Quartile-based Tests: Self-Assessed and Measured Creativity for DT Creativity Quartiles. Colored dots indicate jittered participant-level data; black dots with error bars indicate means with 95% confidence intervals. DT = divergent thinking performance. SA = self-assessment.

Figure 2. Glejser Correlations of Heteroscedasticity. Lines and shaded areas around them represent linear associations with 95% confidence bands. DT = divergent thinking performance. SA = self-assessment.

Table 4. The Glejser test of heteroscedasticity. SA = Self-assessed creative performance; Absolute/Relative = Self-assessment in general/compared to others; Pre/Post = Self-assessments assessed pre/post DT task performance. n Study 1 = 425. n Study 2 = 317. Values in brackets are 95% bias-corrected and accelerated confidence intervals based on 2000 bootstrap samples.

Figure 3. Nonlinear regression between Measured and Self-Assessed Creativity. Lines and shaded areas around them represent quadratic lines of best fit with 95% confidence bands. DT = divergent thinking performance. SA = self-assessment.

Table 1. Descriptive Statistics, Differences between, and Intercorrelations of Self-Assessed and Measured Creativity (Study 1). Measured divergent thinking (DT) creativity is given as raw and percentile values (%); SA = Self-assessed creative performance; Absolute/Relative = Self-assessment in general/compared to others; Pre/Post = Self-assessments pre/post DT task performance. ds are Cohen's ds for the differences between self-assessment and performance percentile (positive values = overestimation) with * indicating significance in two-tailed t-tests (p < 0.05). At n = 425 (study 1), all r ≥ 0.09 are significant at p < 0.05 and all r ≥ 0.16 are significant at p < 0.001. At n = 317 (study 2), all r ≥ 0.11 are significant at p < 0.05 and all r ≥ 0.18 are significant at p < 0.001.